Hierarchical Bayesian Record Linkage Theory

Author

  • Michael D. Larsen
Abstract

In record linkage, or exact file matching, one compares two or more files on a single population for purposes of unduplication or production of an enhanced, merged database. Record linkage has many applications, including population enumeration, the creation of databases for epidemiological investigations, and the improvement of survey sample frames. Latent class and mixture models have been used to implement computerised record linkage of large databases. Probabilities that pairs of records, one record from each of two files, pertain to the same person (a match) or to different people (a nonmatch) are estimated based on model parameters and Bayes’ theorem. In some settings, there is experience with similar record linkage operations that can inform prior opinions concerning model parameters. In this paper, Bayesian record linkage alternatives are developed and compared through simulation. A hierarchical Bayesian model allows parameters to vary by file blocks, which are similar to geographical blocks in census applications. Techniques are presented for incorporating one-to-one matching between files into the likelihood itself and for computing posterior distributions of parameters and linkage indicators.
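
As a rough illustration of how match probabilities follow from model parameters and Bayes’ theorem, the sketch below uses a two-class mixture with conditionally independent field comparisons. This is an assumption made for illustration, not necessarily the model developed in the paper. The function name `match_probability` and the parameters `m`, `u`, and `p` are hypothetical labels: `m[k]` is the probability that field k agrees given a match, `u[k]` is the same probability given a nonmatch, and `p` is the prior proportion of matches among record pairs.

```python
from math import prod


def match_probability(gamma, m, u, p):
    """Posterior probability that a record pair is a match.

    gamma : list of 0/1 agreement indicators, one per compared field
    m     : P(field k agrees | match), one value per field (assumed known)
    u     : P(field k agrees | nonmatch), one value per field (assumed known)
    p     : prior proportion of matches among all record pairs

    Fields are assumed conditionally independent given match status.
    """
    # Likelihood of the agreement pattern under each latent class.
    like_match = prod(mk if g else 1 - mk for g, mk in zip(gamma, m))
    like_nonmatch = prod(uk if g else 1 - uk for g, uk in zip(gamma, u))

    # Bayes' theorem: prior times likelihood, normalised over both classes.
    numerator = p * like_match
    return numerator / (numerator + (1 - p) * like_nonmatch)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    m = [0.9, 0.9]
    u = [0.1, 0.1]
    print(match_probability([1, 1], m, u, p=0.5))  # both fields agree
    print(match_probability([0, 0], m, u, p=0.5))  # both fields disagree
```

In a hierarchical version, `m`, `u`, and `p` would be allowed to differ by block and would be given prior distributions instead of being fixed inputs; that extension is not shown here.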

Similar resources

Some advances on Bayesian record linkage and inference for linked data

In this paper we review some recent advances on Bayesian methodology for performing Record Linkage and for making inference using the resulting matched units. In particular we frame the record linkage issue into a formal inferential problem and we adapt standard model selection techniques to this context. Although the methodology is quite general, we will focus on the simple multiple regression...

Methods for Record Linkage and Bayesian Networks

Although terminology differs, there is considerable overlap between record linkage methods based on the Fellegi-Sunter model (JASA 1969) and Bayesian networks used in machine learning (Mitchell 1997). Both are based on formal probabilistic models that can be shown to be equivalent in many situations (Winkler 2000). When no missing data are present in identifying fields and training data are ava...

Bayesian Estimation of Bipartite Matchings for Record Linkage

The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers and it is important for a wide variety of applications given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for rec...

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognising duplicates within a data set. The i...

A Hierarchical Graphical Model for Record Linkage

The task of matching co-referent records is known among other names as record linkage. For large record-linkage problems, often there is little or no labeled data available, but unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for...

Publication date: 2005